Authorship and Plagiarism Detection Using Binary BOW Features

نویسنده

  • Navot Akiva
چکیده

Identifying writing style shifts and variations are fundamental capabilities when addressing authorship related tasks. In this work we examine a simplified approach for unsupervised authorship and plagiarism detection which is based on binary bag of words representation. We evaluate our approach using PAN-2012 Authorship Attribution challenge data, which includes both open/closed class authorship identification and intrinsic plagiarism tasks. Our approach proved to be successful achieving overall average accuracy of 84% over and a 2nd place rank in the competition.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Survey on Authorship Analysis

The paper discusses about the problem of Authorship analysis, different types of authorship analysis’s such as authorship attribution, authorship identification, authorship profiling, plagiarism detection. It also addresses the issues in Indian language text. Keywords— Authorship attribution, authorship profiling, plagiarism detection, text classification.

متن کامل

Exploring the Intersection of Short Answer Assessment, Authorship Attribution, and Plagiarism Detection

In spite of methodological and conceptual parallels, the computational linguistic applications short answer scoring (Burrows et al., 2015), authorship attribution (Stamatatos, 2009), and plagiarism detection (Zesch and Gurevych, 2012) have not been linked in practice. This work explores the practical usefulness of the combination of features from each of these fields for two tasks: short answer...

متن کامل

Quite Simple Approaches for Authorship Attribution, Intrinsic Plagiarism Detection and Sexual Predator Identification Notebook for PAN at CLEF

Tasks such as Authorship Attribution, Intrinsic Plagiarism detection and Sexual Predator Identification are representative of attempts to deceive. In the first two, authors try to convince others that the presented work is theirs, and in the third there is an attempt to convince readers to take actions based on false beliefs or ill-perceived risks. In this paper, we discuss our approaches to th...

متن کامل

Binary Code Multi-Author Identification in Multi-Toolchain Scenarios

Knowing the authors of a binary program has significant application to forensic analysis of malicious software (malware), software supply chain risk management, and software plagiarism detection. As different compilation toolchains may generate drastically different binary code for the same source code, it is essential to be able to reliably identify authors across multiple toolchains. However,...

متن کامل

Plagiarism Detection in Programming Assignments Using Deep Features

This paper proposes a method for detecting plagiarism in source-codes using deep features. The embeddings for programs are obtained using a character-level Recurrent Neural Network (char-RNN), which is pre-trained on Linux Kernel source-code. Many popular plagiarism detection tools are based on n-gram techniques at syntactic level. However, these approaches to plagiarism detection fail to captu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012